Re-implement PR-AUC. #7297

trivialfis · 2021-10-08T18:23:01Z

The new implementation supports binary and multi-class classification, as well as learning to rank with binary relevance labels. It also handles empty and irregular datasets by returning NaN instead of throwing an exception. Lastly, the GPU implementation now has the same functionality as the CPU one instead of supporting ranking only.

Perf

n_samples = 1e7
runs = 16
n_classes = 8 (multi-class only)

| Case | Master | AUCPR |
|---|---|---|
| CPU ROC-AUC Binary | 8.385259628295898 | 8.928396701812744 |
| CPU PR-AUC Binary | 9.372312784194946 | 9.454226732254028 |
| GPU ROC-AUC Binary | 0.7207620143890381 | 0.711883544921875 |
| GPU PR-AUC Binary | NA | 0.8523569107055664 |
| CPU ROC-AUC Multi | 64.72866940498352 | 66.78449654579163 |
| CPU PR-AUC Multi | NA | 67.58437657356262 |
| GPU ROC-AUC Multi | 8.162241697311401 | 8.218487024307251 |
| GPU PR-AUC Multi | NA | 8.903043985366821 |

Related

Close #6561.
Close #6272.
Close #6551.
Close #6692.

trivialfis · 2021-10-09T14:21:55Z

LTR support is added. No breaking change.

codecov-commenter · 2021-10-09T17:25:01Z

Codecov Report

Merging #7297 (202fe96) into master (69d3b1b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #7297   +/-   ##
=======================================
  Coverage   83.62%   83.62%           
=======================================
  Files          13       13           
  Lines        3884     3884           
=======================================
  Hits         3248     3248           
  Misses        636      636

RAMitchell

Am I correct that you are trying to extract the scan and reduce phase of the AUC computation, but pass lambda functions to achieve the different variations in AUC (ROC-AUC vs PR-AUC, multiclass and ranking)?

This PR probably needs some performance testing on CPU and GPU.

RAMitchell · 2021-10-10T01:55:48Z

src/metric/auc.cu

+  if (!cache) {
+    cache.reset(new DeviceAUCCache);
+  }
+  cache->Init(predts, is_multi, device);


Any reason not to use its constructor?

Init can handle a changed input matrix.
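
A rough sketch of why a separate `Init` helps here, under stated assumptions: `SketchAUCCache` and `PrepareCache` are hypothetical stand-ins and do not mirror the fields or behaviour of `DeviceAUCCache`. The point is only that a cache created once can be re-initialised on every call, so it can resize its buffers when the input size changes, which a constructor cannot do after the first call.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a reusable cache; names and fields are
// illustrative only.
struct SketchAUCCache {
  std::vector<std::size_t> sorted_idx;
  std::size_t n_reallocs = 0;

  // Runs on every call, so it can react to a different input size.
  void Init(const std::vector<float>& predts) {
    if (sorted_idx.size() != predts.size()) {
      sorted_idx.resize(predts.size());
      ++n_reallocs;
    }
    // Reset the index buffer for the current input.
    std::iota(sorted_idx.begin(), sorted_idx.end(), std::size_t{0});
  }
};

// Lazily create the cache once, then re-initialise it for each input.
void PrepareCache(std::unique_ptr<SketchAUCCache>* cache,
                  const std::vector<float>& predts) {
  if (!*cache) {
    cache->reset(new SketchAUCCache);
  }
  (*cache)->Init(predts);
}
```

Calling `PrepareCache` twice with inputs of the same length reuses the buffer; calling it with a longer input resizes it.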

trivialfis · 2021-10-10T08:07:39Z

> Am I correct that you are trying to extract the scan and reduce phase of the AUC computation, but pass lambda functions to achieve the different variations in AUC (ROC-AUC vs PR-AUC, multiclass and ranking)?

Yes.

> This PR probably needs some performance testing on CPU and GPU.

Will run some simple benchmarks.

trivialfis · 2021-10-10T12:07:50Z

@RAMitchell I attached some benchmark results to the PR description.

RAMitchell

Benchmarks look good, no regression for existing code.

I think I'm happy to go ahead with this, but more testing is always helpful, if you can figure out how to do it for PR-AUC.

trivialfis · 2021-10-19T08:48:11Z

Pushed a commit that prevents integer overflow inside cub for ROC-AUC.
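
As a general illustration of the hazard (not a description of what the commit changes inside cub), counts derived from the benchmark size above can exceed a signed 32-bit integer. The helpers below are hypothetical, and the even split of 1e7 samples into positives and negatives is an assumption.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: number of (positive, negative) pairs, computed in
// 64 bits so the product does not wrap for realistic dataset sizes.
std::uint64_t PairCount(std::uint64_t n_pos, std::uint64_t n_neg) {
  return n_pos * n_neg;
}

// True when the pair count would not fit in a signed 32-bit integer.
bool OverflowsInt32(std::uint64_t n_pos, std::uint64_t n_neg) {
  return PairCount(n_pos, n_neg) >
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}
```

With 5,000,000 positives and 5,000,000 negatives the pair count is 2.5e13, far beyond the 32-bit limit of roughly 2.1e9, so any intermediate of that magnitude needs a 64-bit type.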

trivialfis changed the title ~~[breaking] Re-implement PR-AUC.~~ Re-implement PR-AUC. Oct 9, 2021

trivialfis force-pushed the aucpr branch from de15b2e to 5ae7e97 Compare October 9, 2021 14:34

trivialfis marked this pull request as ready for review October 9, 2021 19:51

RAMitchell reviewed Oct 10, 2021

trivialfis mentioned this pull request Oct 10, 2021

Fix weighted samples in multi-class AUC. #7300

Merged

RAMitchell approved these changes Oct 12, 2021

Re-implement PR-AUC.

e161ed9

* Support binary/multi-class classification, ranking.
* Add documents.
* Handle missing data.

trivialfis force-pushed the aucpr branch from 9eff2cf to e161ed9 Compare October 12, 2021 08:20

Fix rebase.

2cdea2f

StrikerRUS mentioned this pull request Oct 12, 2021

fix behavior for default objective and metric microsoft/LightGBM#4660

Merged

trivialfis mentioned this pull request Oct 13, 2021

Avoid omp reduction in coordinate descent and aft metrics. #7316

Merged

trivialfis added 3 commits October 13, 2021 19:34

More notes.

51fa637

Typo.

62ba1d8

Prevent integer overflow in AUC.

1d77d36

trivialfis merged commit d434942 into dmlc:master Oct 26, 2021

trivialfis deleted the aucpr branch October 26, 2021 05:07
